Informative and adaptive distances and summary statistics in sequential approximate Bayesian computation

Calibrating model parameters on heterogeneous data can be challenging and inefficient. This holds especially for likelihood-free methods such as approximate Bayesian computation (ABC), which rely on the comparison of relevant features in simulated and observed data and are popular for otherwise intractable problems. To address this problem, methods have been developed to scale-normalize data and to derive informative low-dimensional summary statistics using inverse regression models of parameters on data. However, while approaches only correcting for scale can be inefficient on partly uninformative data, the use of summary statistics can lead to information loss and relies on the accuracy of the employed methods. In this work, we first show that the combination of adaptive scale normalization with regression-based summary statistics is advantageous on heterogeneous parameter scales. Second, we present an approach employing regression models not to transform data, but to inform sensitivity weights quantifying data informativeness. Third, we discuss problems for regression models under non-identifiability, and present a solution using target augmentation. We demonstrate improved accuracy and efficiency of the presented approach on various problems, in particular the robustness and wide applicability of the sensitivity weights. Our findings demonstrate the potential of the adaptive approach. The developed algorithms have been made available in the open-source Python toolbox pyABC.
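For readers who wish to try the adaptive distances, a minimal usage sketch with pyABC follows. The toy model, parameter names, and database path are illustrative placeholders, and AdaptivePNormDistance is used here as a representative adaptive, scale-normalizing distance; this is a sketch of toolbox usage, not the full sensitivity-weighted scheme described above.

```python
import numpy as np
import pyabc
from pyabc.distance import AdaptivePNormDistance

def model(parameters):
    # toy forward model with heterogeneous output scales
    return {
        "y_small": parameters["theta"] + 0.01 * np.random.randn(),
        "y_large": 100 * parameters["theta"] + np.random.randn(),
    }

prior = pyabc.Distribution(theta=pyabc.RV("uniform", 0, 1))

# adaptive p-norm distance: per-coordinate scale weights are
# re-estimated from the accepted samples of each generation
distance = AdaptivePNormDistance(p=2)

abc = pyabc.ABCSMC(model, prior, distance, population_size=100)
abc.new("sqlite:///abc.db", {"y_small": 0.5, "y_large": 50.0})
history = abc.run(max_nr_populations=5)
```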


Response to Reviewer 1

Summary
The authors have now responded to my comments, and their revisions have significantly improved the paper. I have a few remaining minor comments I would like the authors to look at before finalizing the paper.
We thank the reviewer for their previous comments and their assessment. We provide a response to the remaining minor comments below.
General Comments

Previous Comment (1): Differentiability Issues. Regarding my previous comments about the differentiability issues, I was referring to the differentiability of the summaries specifically in the case of finite samples. That is, my comment was specifically about the sensitivity matrix defined in (3), which is the derivative, with respect to the data y, of the summary mapping s : ℝ^{d_y} → ℝ^{d_θ}. In the case of sample quantiles, yes, this map is differentiable in expectation, but the resulting map s(y) is not differentiable for any finite sample. Moreover, employing such an estimator will ultimately deliver a biased estimator for the derivative you wish to approximate. The authors have not adequately addressed this issue in the text, as far as I can see, and their response to my previous comment does not resolve it. This lack of exchange between differentiation and integration in finite samples should be acknowledged as a limitation of the method.
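To make this point concrete, the following self-contained toy computation (an illustration, not taken from the paper or the response) shows that the sample median, viewed as a map from a finite data vector to a summary value, is only piecewise linear: its finite-difference gradient with respect to the data exists almost everywhere but splits at ties between order statistics.

```python
import numpy as np

def fd_gradient(f, y, h=1e-6):
    # central finite differences of a scalar summary w.r.t. each data coordinate
    g = np.zeros_like(y)
    for j in range(y.size):
        e = np.zeros_like(y)
        e[j] = h
        g[j] = (f(y + e) - f(y - e)) / (2 * h)
    return g

# away from ties, only the middle order statistic carries the gradient
print(fd_gradient(np.median, np.array([0.0, 1.0, 2.0])))  # [0., 1., 0.]
# at a tie, the map has a kink and the finite-difference gradient splits
print(fd_gradient(np.median, np.array([0.0, 2.0, 2.0])))  # [0., 0.5, 0.5]
```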
We thank the reviewer for this clarification. We have extended the discussion to acknowledge this limitation of the method:

Initial submission: It should be noted that in calculating sensitivity weights we make the assumption that the employed regression model is (sub-)differentiable, which clearly holds for the regression models employed in this work (see also the Supplementary Information, Section 3). Further, we implicitly assume that the employed smooth regression model provides an accurate approximation of the underlying expectations it aims to describe. In practice, it should be carefully evaluated whether such an approximation appears valid for the given forward model.
Revised submission: It should be noted that in calculating sensitivity weights we make the assumption that the employed regression model is (sub-)differentiable, which clearly holds for the regression models employed in this work (see also the Supplementary Information, Section 3). Further, we implicitly assume that the employed smooth regression model provides an accurate approximation of the underlying expectations it aims to describe. In practice, it should be carefully evaluated whether such an approximation appears valid for the given forward model. In the first place, we use the derived sensitivity matrix as a "heuristic" to quantify informativeness for weighting. However, it may fail to serve this function if the underlying quantities are inadequately captured. In particular, it provides a biased approximation of the derivative it intends to approximate when, in the underlying expectations, integration and differentiation cannot be exchanged in finite samples. Studying the consequences of such conceptual and practical discrepancies, as well as investigating alternative measures of sensitivity, could improve the robustness and applicability of the method.
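As a minimal sketch of the differentiable case discussed above (toy data; the aggregation and normalization shown are illustrative simplifications, not the exact weighting scheme of the manuscript): for a fitted linear regression model s(y) = Ay + b, the sensitivity matrix ∂s/∂y is simply the coefficient matrix A, and per-coordinate sensitivity weights can be read off from it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 1000
theta = rng.uniform(0, 1, size=(n, 1))
# data coordinate 0 is informative, 1 is pure noise, 2 is weakly informative
y = np.column_stack([
    theta[:, 0] + 0.1 * rng.standard_normal(n),
    rng.standard_normal(n),
    0.1 * theta[:, 0] + rng.standard_normal(n),
])

reg = LinearRegression().fit(y, theta)
S = reg.coef_                    # shape (d_theta, d_y): Jacobian of s(y)
weights = np.abs(S).sum(axis=0)  # aggregate sensitivity per data coordinate
weights /= weights.sum()         # normalize to unit sum
print(weights)                   # coordinate 0 dominates, coordinate 1 near 0
```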
Previous Comment (5): Inverse Mapping. Frazier et al. (2018) and Li and Fearnhead (2018) do not use the nomenclature 'inverse mapping' because the standard name for the mapping θ → s(y), where y ∼ p(·|θ), is the (sample) binding function, which comes from a long line of research in simulated frequentist statistics (more precisely, Indirect Inference). Nonetheless, this is precisely the inverse mapping that matters for the statistical analysis, as it allows you to recover/identify θ by 'inverting' the limit version of this map. The non-identifiability, i.e., the lack of a unique inverse mapping in the authors' nomenclature, appears to be the culprit behind this particular issue with the combination of model and regression summaries. In that way, I am wondering if this is a breakdown of the identification conditions employed in the theoretical study of ABC and other simulated methods, such as Assumption 3(ii) in Frazier et al. (2018) and 4(ii) in Li and Fearnhead (2018). Please clarify this issue in the text.
We thank the reviewer for this comment. To clarify our meaning of "inverse", we have added the following to the introduction:

Changes in the manuscript:

Initial submission: In a popular class of approaches, inverse regression models of parameters on simulated data have been used as statistics [Borowska et al., 2021, Fearnhead and Prangle, 2012, Jiang et al., 2017].
Revised submission: In a popular class of approaches, inverse regression models of parameters on simulated data have been used as statistics [Borowska et al., 2021, Fearnhead and Prangle, 2012, Jiang et al., 2017]. Here, by "inverse" we mean that the summary statistics map from (functions of) simulated data back to (functions of) the parameters, i.e. in the inverse direction to the forward mechanistic model.

Further, we have clarified the identifiability issue and put it into the context of the mentioned theoretical work via the following discussion in the methods section:

Initial submission: (no corresponding text)

Revised submission: More generally, an informative inverse mapping from data back to parameters can only exist when the forward model θ → π(·|θ) is injective, i.e. the model is structurally identifiable in the limit of infinite data [Lehmann and Casella, 2006, Raue et al., 2009]. We would like to remark that this lack of identifiability of true parameters given data also stands in the way of theoretical asymptotic results obtained for ABC methods regarding convergence of point estimators in the large-data limit in Li and Fearnhead [2018] (Condition 4.(ii)) and the limiting shape of posteriors and posterior means in Frazier et al. [2018] (Assumption 3.(ii)).
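The role of injectivity can be illustrated with a hypothetical toy model (not from the manuscript): if the data depend on θ only through θ² under a sign-symmetric prior, the binding function θ → E[y|θ] is non-injective, a regression of θ on y collapses toward zero, and augmenting the regression target to θ² recovers an informative summary, in the spirit of the target augmentation discussed above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
theta = rng.uniform(-1, 1, size=2000)   # sign-symmetric prior
y = (theta**2 + 0.05 * rng.standard_normal(theta.size)).reshape(-1, 1)

plain = LinearRegression().fit(y, theta)         # target: theta
augmented = LinearRegression().fit(y, theta**2)  # augmented target: theta^2

print(plain.coef_)      # ~ 0: theta itself is not identifiable from y
print(augmented.coef_)  # ~ 1: theta^2 is recoverable, the summary is informative
```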

Response to Reviewer 2
I am very pleased with the responses by the authors. All my questions and comments have been addressed. I believe this will be a very useful paper to the systems biology community, and statistical computing more broadly.